Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subspace-core-primitives refactoring (part 2) #3088

Merged
merged 6 commits into from
Oct 4, 2024

Conversation

nazar-pc
Copy link
Member

@nazar-pc nazar-pc commented Oct 2, 2024

This is a continuation of #3087, it moves indirect blst dependency and other KZG-related functionality out of subspace-core-primitives into new subspace-kzg crate.

I believe this allows us to compile runtime without any C dependencies, improves compilation parallelism and will even allow us to publish this crate to crates.io since git dependencies are not allowed there and KZG library we're using is not published yet.

This should also improve runtime performance since we'll not have to pay for proper Scalar conversion to/from bytes anymore (in-memory representation of it is not just 32 bytes and not necessarily the same bytes as the source).

Changes here are mostly moving code around, renaming and tweaking dependencies.

Code contributor checklist:

Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some theoretical safety concerns which it would be good to document and test.

Did you mean to delete subspace-core-primitives/README.md, currently it looks like an empty file?

Comment on lines 271 to 297
pub fn vec_to_repr(value: Vec<Self>) -> Vec<FsFr> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut FsFr,
value.len(),
value.capacity(),
)
}
}

/// Convenient conversion from vector of underlying representation to scalar for efficiency
/// purposes.
pub fn vec_from_repr(value: Vec<FsFr>) -> Vec<Self> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Self,
value.len(),
value.capacity(),
)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there are some additional vector invariants:

Suggested change
pub fn vec_to_repr(value: Vec<Self>) -> Vec<FsFr> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut FsFr,
value.len(),
value.capacity(),
)
}
}
/// Convenient conversion from vector of underlying representation to scalar for efficiency
/// purposes.
pub fn vec_from_repr(value: Vec<FsFr>) -> Vec<Self> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Self,
value.len(),
value.capacity(),
)
}
}
pub fn vec_to_repr(value: Vec<Self>) -> Vec<FsFr> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped, allocating via original vector ensures other invariants
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut FsFr,
value.len(),
value.capacity(),
)
}
}
/// Convenient conversion from vector of underlying representation to scalar for efficiency
/// purposes.
pub fn vec_from_repr(value: Vec<FsFr>) -> Vec<Self> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped, allocating via original vector ensures other invariants
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Self,
value.len(),
value.capacity(),
)
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "allocating via original vector ensures other invariants" means and what invariants there are. Vector storing transparent struct is the same as vector of the internal contents of that struct. And we don't drop anything to cause use after free. What other invariants are there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector storing transparent struct is the same as vector of the internal contents of that struct.

Yes, this is true, as long as we use from_raw_parts() (because field order isn’t guaranteed), and the arguments are taken unmodified from another vector (or the item layouts are compatible).

Here’s the full list of invariants from the documentation for from_raw_parts():

Most of them are ensured by re-using the unmodified fields from another vector, which is why I suggested adding “allocating via original vector ensures other invariants”:

  • ptr must have been allocated using the global allocator, such as via the alloc::alloc function.
  • length needs to be less than or equal to capacity.
  • capacity needs to be the capacity that the pointer was allocated with.
  • The allocated size in bytes must be no larger than isize::MAX. See the safety documentation of pointer::offset.

And the rest are ensured if the layouts and valid values of the two item types are the same:

  • T needs to have the same alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satisfy the dealloc requirement that memory must be allocated and deallocated with the same layout.)
  • The size of T times the capacity (ie. the allocated size in bytes) needs to be the same size as the pointer was allocated with. (Because similar to alignment, dealloc must be called with the same layout size.)
  • The first length values must be properly initialized values of type T.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like those are automatically implied by #[repr(transparent)]. We have a simple struct with that is #[repr(transparent)] with #[repr(C)] struct inside of it. Might be just me, but "allocating via original vector ensures other invariants" doesn't really provide any additional context to me. We clearly just re-construct vector from the same exact raw parts. If the contents is the same, then alignment and other things are obviously also the same by extension.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, and my suggested wording is intended as a warning not to make any changes to any of the arguments to the unsafe function. For example, we can’t change the length or capacity, or copy the ptr in memory, because that could break their size, alignment, or allocator invariants.

As I said, this was a nitpick, so I’m not going to insist on any changes here.

Comment on lines 301 to 327
pub fn vec_option_to_repr(value: Vec<Option<Self>>) -> Vec<Option<FsFr>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<FsFr>,
value.len(),
value.capacity(),
)
}
}

/// Convenient conversion from vector of optional underlying representation to scalar for
/// efficiency purposes.
pub fn vec_option_from_repr(value: Vec<Option<FsFr>>) -> Vec<Option<Self>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<Self>,
value.len(),
value.capacity(),
)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, I think it is extremely unlikely that these type layouts will be different.

But there is no explicit guarantee that Option<TransparentWrapperOfT> has the same layout as Option<T>. The repr(transparent) guarantee only applies to the structs themselves. (Option is a specialised repr(Rust), and does not have layout guarantees in the absence of niches.)
https://rust-lang.github.io/unsafe-code-guidelines/layout/enums.html#layout-of-a-data-carrying-enums-without-a-repr-annotation

There is also no guarantee that the layout of a repr(Rust) is a deterministic function of the types and order of its fields. This non-guarantee is specifically documented for structs here:
https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust

And there are debugging compiler options that deliberately violate this assumption:
https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#unresolved-question-guaranteeing-compatible-layouts

However, in the absence of those compiler options, the documented current behaviour of minimising the struct size and padding could also be assumed to apply to enums. Particularly when the underlying type is a repr(C) plain old data array [u64; 4]:
https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#default-layout-repr-rust

Suggested change
pub fn vec_option_to_repr(value: Vec<Option<Self>>) -> Vec<Option<FsFr>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<FsFr>,
value.len(),
value.capacity(),
)
}
}
/// Convenient conversion from vector of optional underlying representation to scalar for
/// efficiency purposes.
pub fn vec_option_from_repr(value: Vec<Option<FsFr>>) -> Vec<Option<Self>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<Self>,
value.len(),
value.capacity(),
)
}
}
pub fn vec_option_to_repr(value: Vec<Option<Self>>) -> Vec<Option<FsFr>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, we assume the compiler lays out optional `repr(C)` plain old data arrays the
// same as their optional transparent wrappers, original vector is not dropped, allocating
// via original vector ensures other invariants
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<FsFr>,
value.len(),
value.capacity(),
)
}
}
/// Convenient conversion from vector of optional underlying representation to scalar for
/// efficiency purposes.
pub fn vec_option_from_repr(value: Vec<Option<FsFr>>) -> Vec<Option<Self>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, we assume the compiler lays out optional `repr(C)` plain old data arrays the
// same as their optional transparent wrappers, original vector is not dropped, allocating
// via original vector ensures other invariants
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<Self>,
value.len(),
value.capacity(),
)
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, in general case it is not guaranteed, for example Option<u8> and Option<NonZeroU8> are not symmetric, but in this case it'd be very surprising for invariant to not hold in practice, so I updated description of the safety assumptions as you described. It is possible to break with some exotic compiler options, but it is extremely unlikely to be an issue in practice.

Thanks for careful review!

Comment on lines +30 to +31
#[test]
fn bytes_scalars_conversion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the safety assumption around the layout of Option<Scalar>, it would be good to have a specific test for Vec<Option<Scalar>> conversion (and the reverse) here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added constant assertions about size and alignment instead, let me know if that is what you meant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, I agree that’s enough to detect all potential variations, including field order (which is odd, but specifically documented as something the compiler can do if there are different optimisation settings for different crates). Because the data field has alignment requirements, putting the discriminant before the data would add internal padding, and change the size of the option.

As a counter example, it wouldn’t work for Option<u8>, where we’d need a runtime test that the data bytes matched.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While what you described makes sense, it'd still be odd for compiler to put discriminant in one location for inner struct and another location for its transparent wrapper. I acknowledge it is technically allowed to do that, but still I'd expect it to be consistent. Thankfully we have a larger alignment here, so we're lucky I guess.

Interestingly, there should be two bits inside of this value that could have been used as a niche, which would allow us to have more control over Option behavior, but maybe at some point in the future 🙄

shared/subspace-kzg/src/lib.rs Outdated Show resolved Hide resolved
Base automatically changed from subspace-core-primitives-refactoring to main October 3, 2024 09:21
Copy link
Member Author

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some theoretical safety concerns which it would be good to document and test.

Improved docs around this, added static assertion about matching size and layout and sprinkled some #[inline] that were mysteriously missing on some methods.

Did you mean to delete subspace-core-primitives/README.md, currently it looks like an empty file?

It was an empty file, which is why I removed it.

Comment on lines 271 to 297
pub fn vec_to_repr(value: Vec<Self>) -> Vec<FsFr> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut FsFr,
value.len(),
value.capacity(),
)
}
}

/// Convenient conversion from vector of underlying representation to scalar for efficiency
/// purposes.
pub fn vec_from_repr(value: Vec<FsFr>) -> Vec<Self> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Self,
value.len(),
value.capacity(),
)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "allocating via original vector ensures other invariants" means and what invariants there are. Vector storing transparent struct is the same as vector of the internal contents of that struct. And we don't drop anything to cause use after free. What other invariants are there?

Comment on lines 301 to 327
pub fn vec_option_to_repr(value: Vec<Option<Self>>) -> Vec<Option<FsFr>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<FsFr>,
value.len(),
value.capacity(),
)
}
}

/// Convenient conversion from vector of optional underlying representation to scalar for
/// efficiency purposes.
pub fn vec_option_from_repr(value: Vec<Option<FsFr>>) -> Vec<Option<Self>> {
// SAFETY: `Scalar` is `#[repr(transparent)]` and guaranteed to have the same memory
// layout, original vector is not dropped
unsafe {
let mut value = mem::ManuallyDrop::new(value);
Vec::from_raw_parts(
value.as_mut_ptr() as *mut Option<Self>,
value.len(),
value.capacity(),
)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, in general case it is not guaranteed, for example Option<u8> and Option<NonZeroU8> are not symmetric, but in this case it'd be very surprising for invariant to not hold in practice, so I updated description of the safety assumptions as you described. It is possible to break with some exotic compiler options, but it is extremely unlikely to be an issue in practice.

Thanks for careful review!

Comment on lines +30 to +31
#[test]
fn bytes_scalars_conversion() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added constant assertions about size and alignment instead, let me know if that is what you meant

@nazar-pc nazar-pc requested a review from teor2345 October 3, 2024 10:12
…ves-refactoring-part-2

# Conflicts:
#	Cargo.lock
Copy link
Member

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the updates!

I really do wish Rust would guarantee more about the layout of options.

Comment on lines +30 to +31
#[test]
fn bytes_scalars_conversion() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, I agree that’s enough to detect all potential variations, including field order (which is odd, but specifically documented as something the compiler can do if there are different optimisation settings for different crates). Because the data field has alignment requirements, putting the discriminant before the data would add internal padding, and change the size of the option.

As a counter example, it wouldn’t work for Option<u8>, where we’d need a runtime test that the data bytes matched.

@nazar-pc nazar-pc added this pull request to the merge queue Oct 4, 2024
Merged via the queue into main with commit 2f1914d Oct 4, 2024
11 checks passed
@nazar-pc nazar-pc deleted the subspace-core-primitives-refactoring-part-2 branch October 4, 2024 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants